Parameter Optimization for Statistical Machine Translation: It Pays to Learn from Hard Examples
نویسندگان
چکیده
Research on statistical machine translation has focused on particular translation directions, typically with English as the target language, e.g., from Arabic to English. When we reverse the translation direction, the multiple reference translations turn into multiple possible inputs, which offers both challenges and opportunities. We propose and evaluate several strategies for making use of these multiple inputs: (a) select one of the datasets, (b) select the best input for each sentence, and (c) synthesize an input for each sentence by fusing the available inputs. Surprisingly, we find out that it is best to tune on the hardest available input, not on the one that yields the highest BLEU score. This finding has implications on how to pick good translators and how to select useful data for parameter optimization in SMT.
منابع مشابه
Extending Anchored Learning to Machine Translation Evaluation
We compare two methods, Anchored Learning, and a simple new method (Hyperplane Distance), for finding Hard To Learn examples in Machine Learning tasks that use SVMs. These Hard To Learn examples can be corpus errors, or examples which are difficult to predict with the given set of features. The Anchored Learning method is extended to deal with various kernels and SV regression. A number of expe...
متن کاملA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملTowards a Systematic and Human-Informed Paradigm for High-Quality Machine Translation
Since the advent of modern statistical machine translation (SMT), much progress in system performance has been achieved that went hand-in-hand with ever more sophisticated mathematical models and methods. Numerous small improvements have been reported whose lasting effects are hard to judge, especially when they are combined with other newly proposed modifications of the basic models. Often the...
متن کاملSIZE AND GEOMETRY OPTIMIZATION OF TRUSSES USING TEACHING-LEARNING-BASED OPTIMIZATION
A novel optimization algorithm named teaching-learning-based optimization (TLBO) algorithm and its implementation procedure were presented in this paper. TLBO is a meta-heuristic method, which simulates the phenomenon in classes. TLBO has two phases: teacher phase and learner phase. Students learn from teachers in teacher phases and obtain knowledge by mutual learning in learner phase. The suit...
متن کاملSpeed-Constrained Tuning for Statistical Machine Translation Using Bayesian Optimization
We address the problem of automatically finding the parameters of a statistical machine translation system that maximize BLEU scores while ensuring that decoding speed exceeds a minimum value. We propose the use of Bayesian Optimization to efficiently tune the speed-related decoding parameters by easily incorporating speed as a noisy constraint function. The obtained parameter values are guaran...
متن کامل